```{r}
library(tidyverse)
library(gt)
theme_set(theme_light())
```
Case Study: Stratton AE Banking
Stratton AE-Banking is a newly founded online bank in the US market. The e-banking service is a joint venture between a young fintech start-up and Stratton & Fils, a long-established private banking house in New York. Founded in 2020, the joint venture has since attracted great interest with its digital private banking services. It benefits from an AI-driven recommender engine that combines customers' past investment information with a machine learning engine for market and financial data to derive investment tips and portfolio suggestions. So far, the fintech start-up has successfully attracted young investors and customers. Through the joint venture with Stratton & Fils, it now hopes to also attract the established bank's existing customers.
However, the conservative management of Stratton & Fils is extremely worried about simply approaching all of its customers. It fears that the data-driven, digital customer experience of Stratton AE may put off some of its long-standing customers and harm the long-established, very close customer relationships that are believed to be an essential factor in the bank's historic success.
Management therefore approaches you, as head of the data science team, and asks you to conduct a segmentation analysis of the bank's existing customer base and to identify customer segments that might be open to trying out the Stratton & Fils joint venture. As the basis for your segmentation analysis, the CRM manager provides you with the following data.
Table 8.1: Stratton AE Banking CRM Data

| Variable | Description | Measurement |
|---|---|---|
| Age | Customer Age | Age in Years |
| Income | Household Net Income | Net Income in USD |
| HouseholdSize | Number of People Living in Household | Integer number |
| CityAreaSize | City or Main Area Population | Integer number |
| MeanCityIncome | Average Income on ZIP-Code and Street Level | Average Income in USD |
| MeanCityHousePrize | Average House Prices on ZIP-Code and Street Level from last 5 years | Average Price in USD |
| MeanCityHouseHoldSize | Average Household Size on ZIP-Code and Street Level from last 10 years | Average Number of Inhabitants |
| MeanCitySqFtPrice | Average Price per Square Foot on ZIP-Code and Street Level | USD per Square Foot |
| NumberCars | Number of registered cars of customer | Number of Cars |
| InternetTrafficVolume | Volume of Internet Traffic per customer household | GB |
| MortageVolume | Mortgage to be paid by Customer | USD |
| AccountSpending | Monthly average spending from bank account | USD |
| CreditCardSpending | Monthly average spending from Credit Card | USD |
| HelpHotlineTime | Number of Minutes with Banking Hotline | Minutes |
| CustomerSince | Time since opening bank account | Months |
| GrocerySpending | Average grocery-related spending from bank account | USD |
| StockVolume | Stock Investment | USD |
| CreditVolume | Credit volume with the bank | USD |
| NASDAQInvest | Amount of money invested in NASDAQ-listed companies | USD |
| USAXSFundInvest | Amount of money invested in Stratton-owned share fund for mid-sized US companies | USD |
| BranchVisits | Number of recorded branch visits within the last 8 weeks | Integer number |
| AppLogins | Number of customer logins in mobile banking app within the last 8 weeks | Integer number |
| ATMVisits | Number of times customer used an ATM service point within the last 8 weeks | Integer number |
| TimeOnlineBanking | Time logged into the Online Banking System | Minutes |
| ServiceFees | Extra Fees paid for banking services | USD |
| SocialMediaInter | Number of finance-specific social media profiles a customer follows | Integer number |
| Bitcoins | Number of Bitcoins held by customer | Number |
| NFTs | Number of NFTs bought by customer | Integer number |
We can now load the data into R with the read_csv() function and then inspect the data frame with summary(), str(), and skimr::skim().
```{r}
# Import data
BankinCRMData <- read_csv("Data/StrattonAEBankingCRM.csv")
summary(BankinCRMData)
```
 Age Income HouseholdSize CityAreaSize
Min. :18.00 Min. : 35202 Min. :1.00 Min. : 61613
1st Qu.:23.00 1st Qu.: 42803 1st Qu.:2.00 1st Qu.:121704
Median :30.00 Median : 71268 Median :3.00 Median :450100
Mean :35.16 Mean : 84700 Mean :2.81 Mean :372196
3rd Qu.:45.00 3rd Qu.:125870 3rd Qu.:4.00 3rd Qu.:459418
Max. :74.00 Max. :181863 Max. :8.00 Max. :708729
MeanCityIncome MeanCityHousePrize MeanCityHouseHoldSize MeanCitySqFtPrice
Min. : 35372 Min. : 125011 Min. :1.000 Min. :1871
1st Qu.:116253 1st Qu.: 444817 1st Qu.:2.000 1st Qu.:2627
Median :140458 Median : 614601 Median :3.000 Median :5778
Mean :163706 Mean : 942505 Mean :3.023 Mean :5318
3rd Qu.:235000 3rd Qu.:1849915 3rd Qu.:4.000 3rd Qu.:6741
Max. :286996 Max. :1850000 Max. :8.000 Max. :9886
NumberCars InternetTrafficVolume MortageVolume AccountSpending
Min. :0.000 Min. : 6.00 Min. : 14898 Min. : 500.0
1st Qu.:1.000 1st Qu.: 45.00 1st Qu.:120462 1st Qu.: 560.1
Median :1.000 Median : 60.00 Median :232414 Median : 898.0
Mean :1.384 Mean : 67.57 Mean :202824 Mean :1275.7
3rd Qu.:2.000 3rd Qu.: 86.00 3rd Qu.:287298 3rd Qu.:1647.6
Max. :4.000 Max. :118.00 Max. :605846 Max. :4257.1
CreditCardSpending HelpHotlineTime CustomerSince GrocerySpending
Min. : 501.1 Min. : 0.006058 Min. : 0.00 Min. : 150.1
1st Qu.: 651.1 1st Qu.: 4.577513 1st Qu.: 3.00 1st Qu.: 293.4
Median : 785.7 Median : 8.774480 Median :11.00 Median : 426.5
Mean :1013.3 Mean :12.816409 Mean :19.25 Mean : 535.8
3rd Qu.:1451.4 3rd Qu.:16.593896 3rd Qu.:36.00 3rd Qu.: 627.9
Max. :2041.9 Max. :60.754994 Max. :74.00 Max. :1253.5
StockVolume CreditVolume NASDAQInvest USAXSFundInvest
Min. : 388 Min. : 117.3 Min. : 228.4 Min. : 69.95
1st Qu.:1059 1st Qu.: 161.7 1st Qu.: 401.4 1st Qu.: 149.80
Median :1537 Median : 802.7 Median :1498.1 Median : 313.82
Mean :2142 Mean :1330.1 Mean :1828.5 Mean : 761.65
3rd Qu.:2505 3rd Qu.:2488.0 3rd Qu.:3056.0 3rd Qu.:1060.23
Max. :5738 Max. :3532.3 Max. :4532.4 Max. :3396.61
BranchVisits AppLogins ATMVisits TimeOnlineBanking
Min. : 0.000 Min. : 1.0 Min. : 0.000 Min. : 22.77
1st Qu.: 2.000 1st Qu.: 18.0 1st Qu.: 3.000 1st Qu.: 69.28
Median : 3.000 Median : 64.0 Median : 5.000 Median : 88.26
Mean : 3.913 Mean : 55.7 Mean : 4.928 Mean :113.88
3rd Qu.: 5.000 3rd Qu.: 82.0 3rd Qu.: 7.000 3rd Qu.:152.92
Max. :20.000 Max. :130.0 Max. :11.000 Max. :232.21
ServiceFees SocialMediaInter Bitcoins NFTs
Min. : 0.1343 Min. : 0.00 Min. :0.0000 Min. : 0.000
1st Qu.: 17.8442 1st Qu.: 5.00 1st Qu.:0.0005 1st Qu.: 1.000
Median : 27.2386 Median :16.00 Median :0.0998 Median : 3.000
Mean : 40.9382 Mean :19.03 Mean :0.1937 Mean : 3.317
3rd Qu.: 50.1652 3rd Qu.:31.00 3rd Qu.:0.4005 3rd Qu.: 4.000
Max. :124.2613 Max. :60.00 Max. :0.6014 Max. :12.000
```{r}
str(BankinCRMData)
```
spc_tbl_ [10,750 × 28] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
$ Age : num [1:10750] 40 37 71 53 40 32 72 29 55 38 ...
$ Income : num [1:10750] 79623 71616 78524 69938 74244 ...
$ HouseholdSize : num [1:10750] 2 5 1 3 1 7 5 3 2 3 ...
$ CityAreaSize : num [1:10750] 454686 452465 456594 456594 452004 ...
$ MeanCityIncome : num [1:10750] 90668 156742 52484 118422 36227 ...
$ MeanCityHousePrize : num [1:10750] 1849978 1849599 1849953 1849302 1849247 ...
$ MeanCityHouseHoldSize: num [1:10750] 3 5 3 4 6 3 2 6 3 2 ...
$ MeanCitySqFtPrice : num [1:10750] 3813 5264 3405 2141 2160 ...
$ NumberCars : num [1:10750] 2 2 1 0 1 1 2 3 3 0 ...
$ InternetTrafficVolume: num [1:10750] 58 36 57 71 39 52 62 41 65 67 ...
$ MortageVolume : num [1:10750] 430299 378228 282232 394235 350471 ...
$ AccountSpending : num [1:10750] 938 1128 931 1002 1171 ...
$ CreditCardSpending : num [1:10750] 1418 694 1281 1135 1109 ...
$ HelpHotlineTime : num [1:10750] 3.22 3.44 2.47 5.9 5.1 ...
$ CustomerSince : num [1:10750] 36 36 36 36 37 36 37 36 36 36 ...
$ GrocerySpending : num [1:10750] 433 594 574 561 309 ...
$ StockVolume : num [1:10750] 1118 1392 1117 1354 1037 ...
$ CreditVolume : num [1:10750] 809 803 791 780 804 ...
$ NASDAQInvest : num [1:10750] 1488 1504 1500 1496 1500 ...
$ USAXSFundInvest : num [1:10750] 476 490 465 488 450 ...
$ BranchVisits : num [1:10750] 3 4 3 4 3 4 3 3 4 4 ...
$ AppLogins : num [1:10750] 10 19 14 14 16 13 11 16 20 8 ...
$ ATMVisits : num [1:10750] 9 8 8 8 9 8 9 7 8 7 ...
$ TimeOnlineBanking : num [1:10750] 71.5 67.1 58.9 61 67.2 ...
$ ServiceFees : num [1:10750] 41.8 52.2 54.6 41.8 57.1 ...
$ SocialMediaInter : num [1:10750] 27 25 28 34 34 24 21 32 28 24 ...
$ Bitcoins : num [1:10750] 0.0032 0.0037 0.0136 0.0016 0.0075 0.003 0.0076 0.0055 0.0036 0.0026 ...
$ NFTs : num [1:10750] 2 1 1 1 0 3 2 4 3 1 ...
- attr(*, "spec")=
.. cols(
.. Age = col_double(),
.. Income = col_double(),
.. HouseholdSize = col_double(),
.. CityAreaSize = col_double(),
.. MeanCityIncome = col_double(),
.. MeanCityHousePrize = col_double(),
.. MeanCityHouseHoldSize = col_double(),
.. MeanCitySqFtPrice = col_double(),
.. NumberCars = col_double(),
.. InternetTrafficVolume = col_double(),
.. MortageVolume = col_double(),
.. AccountSpending = col_double(),
.. CreditCardSpending = col_double(),
.. HelpHotlineTime = col_double(),
.. CustomerSince = col_double(),
.. GrocerySpending = col_double(),
.. StockVolume = col_double(),
.. CreditVolume = col_double(),
.. NASDAQInvest = col_double(),
.. USAXSFundInvest = col_double(),
.. BranchVisits = col_double(),
.. AppLogins = col_double(),
.. ATMVisits = col_double(),
.. TimeOnlineBanking = col_double(),
.. ServiceFees = col_double(),
.. SocialMediaInter = col_double(),
.. Bitcoins = col_double(),
.. NFTs = col_double()
.. )
- attr(*, "problems")=<externalptr>
```{r}
skimr::skim(BankinCRMData)
```

| Name | BankinCRMData |
|---|---|
| Number of rows | 10750 |
| Number of columns | 28 |
| Column type frequency: | |
| numeric | 28 |
| Group variables | None |
Variable type: numeric
| skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
|---|---|---|---|---|---|---|---|---|---|---|
| Age | 0 | 1 | 35.16 | 13.54 | 18.00 | 23.00 | 30.00 | 45.00 | 74.00 | ▇▃▂▃▁ |
| Income | 0 | 1 | 84699.60 | 49337.11 | 35202.00 | 42803.25 | 71268.00 | 125870.00 | 181863.00 | ▇▇▁▂▃ |
| HouseholdSize | 0 | 1 | 2.81 | 1.33 | 1.00 | 2.00 | 3.00 | 4.00 | 8.00 | ▇▅▃▁▁ |
| CityAreaSize | 0 | 1 | 372196.13 | 228010.19 | 61613.00 | 121703.75 | 450100.00 | 459418.00 | 708729.00 | ▇▂▁▇▅ |
| MeanCityIncome | 0 | 1 | 163706.02 | 64581.64 | 35372.00 | 116252.75 | 140457.50 | 235000.00 | 286996.00 | ▂▅▇▃▆ |
| MeanCityHousePrize | 0 | 1 | 942505.00 | 713686.15 | 125011.00 | 444816.50 | 614600.50 | 1849915.00 | 1850000.00 | ▇▆▁▁▇ |
| MeanCityHouseHoldSize | 0 | 1 | 3.02 | 1.34 | 1.00 | 2.00 | 3.00 | 4.00 | 8.00 | ▇▅▅▁▁ |
| MeanCitySqFtPrice | 0 | 1 | 5318.27 | 2394.52 | 1871.00 | 2627.00 | 5778.50 | 6741.00 | 9886.00 | ▇▃▇▂▅ |
| NumberCars | 0 | 1 | 1.38 | 0.93 | 0.00 | 1.00 | 1.00 | 2.00 | 4.00 | ▂▇▃▂▁ |
| InternetTrafficVolume | 0 | 1 | 67.57 | 33.10 | 6.00 | 45.00 | 60.00 | 86.00 | 118.00 | ▅▆▆▇▇ |
| MortageVolume | 0 | 1 | 202823.81 | 138105.98 | 14898.00 | 120461.75 | 232413.50 | 287297.50 | 605846.00 | ▇▇▅▃▁ |
| AccountSpending | 0 | 1 | 1275.73 | 978.17 | 500.00 | 560.14 | 897.96 | 1647.60 | 4257.14 | ▇▂▁▁▁ |
| CreditCardSpending | 0 | 1 | 1013.33 | 469.02 | 501.11 | 651.08 | 785.67 | 1451.37 | 2041.85 | ▇▂▁▂▂ |
| HelpHotlineTime | 0 | 1 | 12.82 | 11.12 | 0.01 | 4.58 | 8.77 | 16.59 | 60.75 | ▇▂▂▁▁ |
| CustomerSince | 0 | 1 | 19.25 | 21.68 | 0.00 | 3.00 | 11.00 | 36.00 | 74.00 | ▇▁▂▁▂ |
| GrocerySpending | 0 | 1 | 535.81 | 300.88 | 150.10 | 293.43 | 426.48 | 627.91 | 1253.50 | ▇▅▅▁▃ |
| StockVolume | 0 | 1 | 2142.25 | 1496.22 | 387.97 | 1058.73 | 1536.71 | 2505.16 | 5738.39 | ▇▅▂▁▂ |
| CreditVolume | 0 | 1 | 1330.12 | 1196.72 | 117.30 | 161.72 | 802.75 | 2487.98 | 3532.34 | ▇▂▂▃▂ |
| NASDAQInvest | 0 | 1 | 1828.48 | 1447.34 | 228.35 | 401.40 | 1498.10 | 3056.05 | 4532.36 | ▇▆▁▅▂ |
| USAXSFundInvest | 0 | 1 | 761.65 | 836.46 | 69.95 | 149.80 | 313.82 | 1060.23 | 3396.61 | ▇▂▂▁▁ |
| BranchVisits | 0 | 1 | 3.91 | 3.22 | 0.00 | 2.00 | 3.00 | 5.00 | 20.00 | ▇▂▁▁▁ |
| AppLogins | 0 | 1 | 55.70 | 34.64 | 1.00 | 18.00 | 64.00 | 82.00 | 130.00 | ▇▃▇▃▃ |
| ATMVisits | 0 | 1 | 4.93 | 2.32 | 0.00 | 3.00 | 5.00 | 7.00 | 11.00 | ▅▇▅▇▁ |
| TimeOnlineBanking | 0 | 1 | 113.88 | 59.18 | 22.77 | 69.28 | 88.26 | 152.92 | 232.21 | ▅▇▅▅▃ |
| ServiceFees | 0 | 1 | 40.94 | 31.09 | 0.13 | 17.84 | 27.24 | 50.17 | 124.26 | ▇▆▂▁▂ |
| SocialMediaInter | 0 | 1 | 19.03 | 17.05 | 0.00 | 5.00 | 16.00 | 31.00 | 60.00 | ▇▅▃▁▂ |
| Bitcoins | 0 | 1 | 0.19 | 0.22 | 0.00 | 0.00 | 0.10 | 0.40 | 0.60 | ▇▁▁▂▂ |
| NFTs | 0 | 1 | 3.32 | 2.69 | 0.00 | 1.00 | 3.00 | 4.00 | 12.00 | ▇▆▂▂▁ |
To identify segments of similar customers, let us first focus on how to measure similarity. Table 2 shows some observations for customers from another banking database; its columns contain the values of several customer-related attributes. We can use these attribute values to calculate a so-called distance measure, which shows how similar or dissimilar two customers are: the larger the distance, the more dissimilar they are. For continuous variables, we can use the basic Euclidean distance. The Euclidean distance between two customers A and B is given by the following equation, where \(f_{i,A}\) denotes the value of attribute \(i\) for customer A and \(n\) is the number of attributes.
\[ED_{A,B}= \sqrt{(f_{1,A}-f_{1,B})^{2}+(f_{2,A}-f_{2,B})^{2}+...+(f_{n,A}-f_{n,B})^{2}} \]
```{r}
Table2_1
```
We can now use the formula for the Euclidean distance to calculate, for example, the distance between Hawkeye and Potter.
```{r}
ED_Hawkeye_Potter <- sqrt((32 - 64)^2 + (45 - 75)^2 + (25 - 10)^2 + (1 - 3)^2)
ED_Hawkeye_Potter
```
[1] 46.40043
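The same calculation generalizes to any number of attributes. The following sketch wraps the formula in a small helper function; the name euclidean_distance() is our own hypothetical choice and not part of any package, and the attribute values are the Hawkeye and Potter values used in the calculation above.

```r
# Hypothetical helper: Euclidean distance between two numeric attribute vectors
euclidean_distance <- function(a, b) {
  stopifnot(is.numeric(a), is.numeric(b), length(a) == length(b))
  sqrt(sum((a - b)^2))  # square the differences, add them up, take the root
}

# Attribute values of Hawkeye and Potter from the calculation above
hawkeye <- c(32, 45, 25, 1)
potter  <- c(64, 75, 10, 3)

euclidean_distance(hawkeye, potter)  # 46.40043
```

With this helper, the exercise below only requires entering the attribute vectors of the other customers.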
Repeat the calculations for Hawkeye and Burns as well as for Hawkeye and Hotlips. What can you say about the distances between these customers?
Distance function
While this is a useful exercise, it is impossible to calculate by hand the distances among all members of a large customer database with, e.g., 200,000 entries. Instead, we can let R do the work, for example with the distance() function from the philentropy package. We simply pass it a data frame with all observations we would like to compare, and it returns a matrix with the corresponding pairwise distances.
```{r}
library(philentropy)
distance(Table2_1[, 2:5], method = "euclidean")
```
 v1 v2 v3 v4 v5
v1 0.000 33541.03 5830.978 32526.912 34669.872
v2 33541.035 0.00 34481.883 53600.382 59135.448
v3 5830.978 34481.88 0.000 26907.253 29529.653
v4 32526.912 53600.38 26907.253 0.000 7211.104
v5 34669.872 59135.45 29529.653 7211.104 0.000
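If you prefer not to install an additional package, base R's dist() function (from the stats package, which is loaded by default) computes the same pairwise Euclidean distances. A minimal sketch, reusing only the Hawkeye and Potter values from the manual calculation above, since the full Table 2 data are not reproduced here:

```r
# Hawkeye and Potter as rows of a matrix; the row names become the labels
customers <- rbind(
  Hawkeye = c(32, 45, 25, 1),
  Potter  = c(64, 75, 10, 3)
)

# dist() returns the lower triangle of all pairwise distances
d <- dist(customers, method = "euclidean")
d

# Convert to a full symmetric matrix, similar to the philentropy output
as.matrix(d)
```

Applied to a data frame with five rows, dist() returns all ten pairwise distances at once.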
k-means as a solution to form homogeneous subgroups
While distances help us understand similarities and dissimilarities, they do not yet help us form subgroups, and from the distances alone we do not know which threshold separates similar from dissimilar customers. Hawkeye may be closest to Burns, but is a distance of 18 still large, or does it already indicate that the two are fairly similar? Who should be paired with whom?
This implies that grouping customers into homogeneous subgroups requires careful attention and balance, as well as more information than similarity measures alone. In addition, our five-customer example shows that grouping by hand takes time and effort, which rules out forming larger groups or segmenting data sets with hundreds of thousands of customers this way. It is therefore time to turn to methods that use inter-subject distances to form groups automatically. Such methods are commonly referred to as cluster analysis. Cluster analysis is a well-known, established statistical method that has been used in marketing research for the last 30 to 40 years. With the advent of machine learning and artificial intelligence applications, cluster analysis has regained popularity in data science, where it is often referred to as an unsupervised learning algorithm.
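Watch this video about how k-means clustering works.
In short, k-means starts from k initial cluster centers (centroids), assigns every observation to its nearest centroid, recalculates each centroid as the mean of the observations assigned to it, and then repeats the assignment and update steps. The algorithm stops once no observation can be reassigned to another cluster or after a specified maximum number of iterations.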
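To make the assign-and-update loop concrete, here is a minimal from-scratch sketch of the simple Lloyd-style variant of k-means. It is for illustration only: lloyd_kmeans() is a hypothetical helper, it uses a single random start, and it is not the Hartigan–Wong algorithm that R's kmeans() uses by default. The toy data are simulated.

```r
# Hypothetical helper: a minimal Lloyd-style k-means (illustration only;
# in practice, use stats::kmeans())
lloyd_kmeans <- function(x, k, max_iter = 100, seed = 1) {
  x <- as.matrix(x)
  set.seed(seed)
  # Initialization: use k randomly chosen observations as starting centroids
  centroids <- x[sample.int(nrow(x), k), , drop = FALSE]
  cluster <- integer(nrow(x))
  for (iter in seq_len(max_iter)) {
    # Assignment step: squared Euclidean distance from every observation
    # (rows) to every centroid (columns); pick the nearest centroid
    d <- sapply(seq_len(k), function(j) colSums((t(x) - centroids[j, ])^2))
    new_cluster <- max.col(-d, ties.method = "first")
    # Stopping rule: no observation changed its cluster
    if (all(new_cluster == cluster)) break
    cluster <- new_cluster
    # Update step: move each centroid to the mean of its observations
    for (j in seq_len(k)) {
      members <- x[cluster == j, , drop = FALSE]
      if (nrow(members) > 0) centroids[j, ] <- colMeans(members)
    }
  }
  list(cluster = cluster, centers = centroids, iterations = iter)
}

# Usage on simulated data: two well-separated groups of 20 points each
set.seed(2024)
toy <- rbind(matrix(rnorm(40, mean = 0, sd = 0.3), ncol = 2),
             matrix(rnorm(40, mean = 5, sd = 0.3), ncol = 2))
res <- lloyd_kmeans(toy, k = 2)
table(res$cluster)
res$centers
```

Because a single random start can end in a local optimum, production implementations such as kmeans() offer several random starts.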
One thing to keep in mind before running a cluster analysis is scale heterogeneity. k-means clustering is particularly sensitive to variables measured on different scales: because it relies on Euclidean distances, variables with large numeric ranges (such as Income in USD) dominate variables with small ranges (such as HouseholdSize), which can bias the results. A common fix is to standardize the variables, i.e., to subtract each variable's mean and divide by its standard deviation (z-standardization), so that all variables have a mean of 0 and a standard deviation of 1.
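The following sketch illustrates the problem and the fix with three hypothetical customers (made-up values) described by income in USD and household size. Before standardization, the distances are driven almost entirely by income; after applying scale(), household size counts as well. The sketch also checks that scale() computes the same z-scores as the formula (x - mean) / sd.

```r
# Three hypothetical customers (made-up values for illustration)
customers <- data.frame(
  Income        = c(50000, 52000, 90000),  # household net income in USD
  HouseholdSize = c(1, 6, 1)               # number of people
)
rownames(customers) <- c("A", "B", "C")

# Raw Euclidean distances: dominated by income
d_raw <- as.matrix(dist(customers))
round(d_raw, 2)

# z-standardization by hand: (x - mean) / sd, column by column
z_manual <- apply(customers, 2, function(v) (v - mean(v)) / sd(v))

# The same with scale(); it returns a matrix with centers and scales as attributes
z_scale <- scale(customers, center = TRUE, scale = TRUE)
all.equal(z_manual, z_scale, check.attributes = FALSE)  # TRUE

# After standardization, each column has mean 0 and standard deviation 1 ...
round(colMeans(z_scale), 10)
apply(z_scale, 2, sd)

# ... and household size now matters: A is about equally far from B and C
d_z <- as.matrix(dist(z_scale))
round(d_z, 3)
```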
R can standardize all variables for us with the scale() function. Note that scale() returns a matrix rather than a data frame. We then inspect the first rows of the resulting object scaled.crm with the head() function.
```{r}
scaled.crm <- scale(BankinCRMData, center = TRUE, scale = TRUE) # z-score standardization
head(scaled.crm)
```
 Age Income HouseholdSize CityAreaSize MeanCityIncome
[1,] 0.3576515 -0.1028962 -0.6110203 0.3617815 -1.1309409
[2,] 0.1361200 -0.2651878 1.6529279 0.3520407 -0.1078328
[3,] 2.6468104 -0.1251715 -1.3656697 0.3701496 -1.7221925
[4,] 1.3176214 -0.2991988 0.1436291 0.3701496 -0.7011903
[5,] 0.3576515 -0.2119217 -1.3656697 0.3500189 -1.9739204
[6,] -0.2330992 -0.1816402 3.1622266 0.3681716 -1.1212168
MeanCityHousePrize MeanCityHouseHoldSize MeanCitySqFtPrice NumberCars
[1,] 1.271530 -0.01716176 -0.62863020 0.6657393
[2,] 1.270998 1.47667561 -0.02266382 0.6657393
[3,] 1.271494 -0.01716176 -0.79901909 -0.4146800
[4,] 1.270582 0.72975692 -1.32689056 -1.4950993
[5,] 1.270505 2.22359429 -1.31895578 -0.4146800
[6,] 1.271217 -0.01716176 -0.26404808 -0.4146800
InternetTrafficVolume MortageVolume AccountSpending CreditCardSpending
[1,] -0.2889997 1.6471061 -0.3450280 0.8636732
[2,] -0.9537126 1.2700695 -0.1509407 -0.6818782
[3,] -0.3192140 0.5749801 -0.3528008 0.5714726
[4,] 0.1037852 1.3859733 -0.2801354 0.2591261
[5,] -0.8630700 1.0690862 -0.1072321 0.2034089
[6,] -0.4702851 0.9330313 -0.3513856 -0.1526760
HelpHotlineTime CustomerSince GrocerySpending StockVolume CreditVolume
[1,] -0.8632445 0.7723802 -0.34104672 -0.6842514 -0.4354132
[2,] -0.8427150 0.7723802 0.19341832 -0.5016177 -0.4407973
[3,] -0.9302276 0.7723802 0.12646708 -0.6849314 -0.4507881
[4,] -0.6214784 0.7723802 0.08311691 -0.5265523 -0.4594416
[5,] -0.6937764 0.8184972 -0.75375557 -0.7384732 -0.4396339
[6,] -0.5917007 0.7723802 0.09847517 -0.3919688 -0.4472727
NASDAQInvest USAXSFundInvest BranchVisits AppLogins ATMVisits
[1,] -0.2353360 -0.3417909 -0.28345927 -1.319352 1.751956
[2,] -0.2240967 -0.3244452 0.02716123 -1.059550 1.321701
[3,] -0.2269506 -0.3549125 -0.28345927 -1.203885 1.321701
[4,] -0.2294082 -0.3270950 0.02716123 -1.203885 1.321701
[5,] -0.2268875 -0.3721992 -0.28345927 -1.146151 1.751956
[6,] -0.2259212 -0.2313141 0.02716123 -1.232751 1.321701
TimeOnlineBanking ServiceFees SocialMediaInter Bitcoins NFTs
[1,] -0.7152344 0.02878044 0.4675133 -0.8542812 -0.4895549
[2,] -0.7897758 0.36121889 0.3502121 -0.8520388 -0.8611631
[3,] -0.9289516 0.44046295 0.5261639 -0.8076386 -0.8611631
[4,] -0.8941460 0.02794600 0.8780673 -0.8614570 -0.8611631
[5,] -0.7894912 0.52092062 0.8780673 -0.8349963 -1.2327713
[6,] -0.9766564 0.51830595 0.2915616 -0.8551782 -0.1179467
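As you can see, all variables are now on a comparable scale: each standardized value expresses how many standard deviations a customer lies above or below the variable's mean. We can thus proceed with our analysis.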
We can now start with the cluster analysis. Let us first try out solutions with different numbers of clusters. k-means starts from randomly chosen initial centroids, so we call the set.seed() function beforehand to ensure that we start from the same centroids, and therefore obtain the same results, every time we run the code. Without set.seed(), repeated runs may yield different solutions that are often similar but not necessarily identical (the numbering of the clusters can also change). We run a k-means cluster analysis with R's kmeans() function, passing it the standardized customer data and the number of clusters k. Here we set k to 4.
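One common way to try out different numbers of clusters is the elbow heuristic: run k-means for a range of k and plot the total within-cluster sum of squares (tot.withinss, a component of the object returned by kmeans()); a bend ("elbow") in the curve suggests a candidate k. The sketch below uses R's built-in iris measurements as a stand-in for scaled.crm, so the values are purely illustrative, and the helper wss_for_k() is hypothetical. It also shows that fixing the seed makes repeated runs identical.

```r
# Stand-in data: the four numeric iris columns, z-standardized like scaled.crm
x <- scale(iris[, 1:4])

# Hypothetical helper: total within-cluster sum of squares for k = 1, ..., k_max
wss_for_k <- function(data, k_max = 6, seed = 123) {
  set.seed(seed)  # same seed -> same random initial centroids -> same results
  sapply(seq_len(k_max), function(k) {
    kmeans(data, centers = k, nstart = 10)$tot.withinss
  })
}

wss <- wss_for_k(x)
round(wss, 1)

# Elbow plot: look for the k after which the curve flattens
plot(seq_along(wss), wss, type = "b",
     xlab = "Number of clusters k",
     ylab = "Total within-cluster sum of squares")

identical(wss, wss_for_k(x))  # TRUE: the same seed reproduces the results
```

The elbow is a heuristic; the final choice of k should also consider whether the resulting segments are interpretable and actionable for the bank.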
By default, R's kmeans() uses the algorithm of Hartigan and Wong (1979).
As the R documentation of kmeans() notes, some authors use k-means to refer to a specific algorithm rather than the general method: most commonly the algorithm given by MacQueen (1967), but sometimes that given by Lloyd (1957) and Forgy (1965). The Hartigan–Wong algorithm generally does a better job than either of those, but trying several random starts (nstart > 1) is often recommended. In rare cases, when some of the points (rows of x) are extremely close, the algorithm may not converge in the "Quick-Transfer" stage, signalling a warning (and returning ifault = 4). Slight rounding of the data may be advisable in that case.
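A minimal sketch of these options, again using the standardized iris measurements as a stand-in for scaled.crm (values purely illustrative): nstart runs several random starts and keeps the best solution, and algorithm switches between the variants mentioned above.

```r
x <- scale(iris[, 1:4])  # stand-in for scaled.crm

set.seed(123)
fit_hw <- kmeans(x, centers = 3, nstart = 25)  # default Hartigan-Wong, 25 random starts

set.seed(123)
fit_lloyd <- kmeans(x, centers = 3, nstart = 25,
                    algorithm = "Lloyd", iter.max = 50)  # Lloyd often needs more iterations

# Lower total within-cluster sum of squares = more compact clusters
c(HartiganWong = fit_hw$tot.withinss, Lloyd = fit_lloyd$tot.withinss)

# Compare cluster sizes; cluster numbers are arbitrary labels
fit_hw$size
fit_lloyd$size

# Sanity check: total SS splits into between- and within-cluster SS
all.equal(fit_hw$totss, fit_hw$betweenss + fit_hw$tot.withinss)
```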
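```{r}
# Fix the random seed so that the randomly chosen initial centroids,
# and therefore the cluster solution, are reproducible
set.seed(123)
# k-means with k = 4 clusters on the standardized CRM data
StrattonCluster_4k <- kmeans(scaled.crm, centers = 4)
StrattonCluster_4k
# Number of customers per cluster
StrattonCluster_4k[["size"]]
# Store the cluster sizes with readable cluster labels (e.g., for a table or plot)
sizes4k <- data.frame(Size = StrattonCluster_4k[["size"]],
                      Cluster = c("Cluster1", "Cluster2", "Cluster3", "Cluster4"))
```
K-means clustering with 4 clusters of sizes 1000, 1250, 2996, 5504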
Cluster means:
Age Income HouseholdSize CityAreaSize MeanCityIncome
1 0.4957394 -0.3386598 0.47642946 0.3414837 -0.8350901
2 0.1625265 -0.1976437 -0.52529213 0.3582921 1.1039357
3 0.6297842 1.4843069 0.12524143 -0.9306195 -0.7429876
4 -0.4697913 -0.7015387 -0.03543561 0.3631517 0.3054436
MeanCityHousePrize MeanCityHouseHoldSize MeanCitySqFtPrice NumberCars
1 -1.1362805 0.837313214 -1.26447602 -0.03113115
2 -0.6899384 -0.750337139 0.36552584 -1.10441967
3 -0.5157368 -0.006690937 0.10248511 0.81828180
4 0.6438683 0.021921195 0.09093811 -0.18893832
InternetTrafficVolume MortageVolume AccountSpending CreditCardSpending
1 1.4329995 -0.5650106 0.02116941 -0.56003954
2 1.4337186 0.3434931 -0.48984671 -0.98809483
3 -0.9842698 0.5143171 1.33048187 0.73046345
4 -0.0501954 -0.2553143 -0.61682135 -0.07145901
HelpHotlineTime CustomerSince GrocerySpending StockVolume CreditVolume
1 -0.4072184 -0.3120147 0.3798751 0.9081969 0.14255359
2 -0.2284210 -0.7718011 -0.3785228 2.2435110 1.81347551
3 1.1635969 1.0279632 1.2118918 0.1702179 -0.90296052
4 -0.5075203 -0.3275821 -0.6427233 -0.7671800 0.05375577
NASDAQInvest USAXSFundInvest BranchVisits AppLogins ATMVisits
1 1.1548262 2.5528726 -0.8593497 0.9937584 0.9357632
2 1.8457024 0.3440906 -0.7150975 1.7089551 -0.7479942
3 0.2932005 -0.6714841 1.1815166 -0.3830160 -0.9267941
4 -0.7885870 -0.1764570 -0.3246007 -0.3601810 0.5043431
TimeOnlineBanking ServiceFees SocialMediaInter Bitcoins NFTs
1 1.0350606 1.1048109 0.9562486 1.82244896 1.1722769
2 1.8769642 2.2202269 2.1344096 1.37359382 2.1134861
3 -0.4039396 -0.3502199 -0.8524771 0.02738211 -0.4874463
4 -0.3944518 -0.5143233 -0.1944475 -0.65797203 -0.4276427
Clustering vector:
[1] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
[37] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
[73] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
[109] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
[145] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
[181] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
[217] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
[253] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
[289] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
[325] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
[361] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
[397] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
[433] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
[469] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
[505] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
[541] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
[577] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
[613] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
[649] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
[685] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
[721] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
[757] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
[793] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
[829] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
[865] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
[901] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
[937] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
[973] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
[1009] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
[1045] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
[1081] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
[1117] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
[1153] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
[1189] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
[1225] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
[1261] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
[1297] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
[1333] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
[1369] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
[1405] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
[1441] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
[1477] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 1 1 1 1 1 1 1 1 1 1 1 1
[1513] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
[1549] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
[1585] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
[1621] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
[1657] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
[1693] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
[1729] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
[1765] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
[1801] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
[1837] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
[1873] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
[1909] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
[1945] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
[1981] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
[2017] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
[2053] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
[2089] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
[2125] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
[2161] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
[2197] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
[2233] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
[2269] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
[2305] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
[2341] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
[2377] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
[2413] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
[2449] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
[2485] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
[2521] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
[2557] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
[2593] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
[2629] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
[2665] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
[2701] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
[2737] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
[2773] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
[2809] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
[2845] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
[2881] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
[2917] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
[2953] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
[2989] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
[3025] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
[3061] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
[3097] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
[3133] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
[3169] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
[3205] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
[3241] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
[3277] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
[3313] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
[3349] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
[3385] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
[3421] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
[3457] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
[3493] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
[3529] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
[3565] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
[3601] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
[3637] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
[3673] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
[3709] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
[3745] 2 2 2 2 2 2 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
[3781] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
[3817] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
[3853] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
[3889] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
[3925] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
[3961] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
[3997] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
[4033] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
[4069] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
[4105] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
[4141] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
[4177] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
[4213] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
[4249] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
[4285] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
[4321] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
[4357] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
[4393] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
[4429] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
[4465] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
[4501] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
[4537] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
[4573] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
[4609] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
[4645] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
[4681] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
[4717] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
[4753] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
[4789] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
[4825] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
[4861] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
[4897] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
[4933] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
[4969] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
[5005] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
[5041] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
[5077] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
[5113] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
[5149] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
[5185] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
[5221] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 4 4 4 4 4 4
[5257] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
[5293] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
[5329] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
[5365] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
[5401] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
[5437] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
[5473] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
[5509] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
[5545] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
[5581] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
[5617] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
[5653] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
[5689] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
[5725] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
[5761] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
[5797] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
[5833] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
[5869] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
[5905] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
[5941] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
[5977] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
[6013] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
[6049] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
[6085] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
[6121] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
[6157] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
[6193] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
[6229] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
[6265] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
[6301] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
[6337] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
[6373] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
[6409] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
[6445] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
[6481] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
[6517] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
[6553] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
[6589] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
[6625] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
[6661] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
[6697] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
[6733] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
[6769] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
[6805] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
[6841] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
[6877] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
[6913] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
[6949] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
[6985] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
[7021] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
[7057] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
[7093] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
[7129] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
[7165] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
[7201] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
[7237] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
[7273] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
[7309] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
[7345] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
[7381] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
[7417] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
[7453] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
[7489] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
[7525] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
[7561] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
[7597] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
[7633] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
[7669] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
[7705] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
[7741] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
[7777] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
[7813] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
[7849] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
[7885] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
[7921] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
[7957] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
[7993] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
[8029] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
[8065] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
[8101] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
[8137] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
[8173] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
[8209] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
[8245] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
[8281] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
[8317] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
[8353] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
[8389] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
[8425] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
[8461] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
[8497] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
[8533] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
[8569] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
[8605] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
[8641] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
[8677] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
[8713] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
[8749] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
[8785] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
[8821] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
[8857] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
[8893] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
[8929] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
[8965] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
[9001] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
[9037] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
[9073] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
[9109] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
[9145] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
[9181] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
[9217] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 3 3
[9253] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
[9289] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
[9325] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
[9361] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
[9397] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 4 3
[9433] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 4 3 3 3
[9469] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
[9505] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
[9541] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
[9577] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
[9613] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
[9649] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
[9685] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
[9721] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
[9757] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
[9793] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
[9829] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
[9865] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
[9901] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
[9937] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
[9973] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
[10009] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
[10045] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
[10081] 3 3 3 3 3 3 3 3 3 3 3 3 4 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
[10117] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
[10153] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
[10189] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
[10225] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
[10261] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
[10297] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
[10333] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
[10369] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
[10405] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
[10441] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
[10477] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
[10513] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
[10549] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
[10585] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
[10621] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 4 3 3 3 3
[10657] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
[10693] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
[10729] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
Within cluster sum of squares by cluster:
[1] 3850.269 1289.676 50440.263 81630.942
(between_SS / total_SS = 54.4 %)
Available components:
[1] "cluster" "centers" "totss" "withinss" "tot.withinss"
[6] "betweenss" "size" "iter" "ifault"
[1] 1000 1250 2996 5504
To visualize the number of customers assigned to each cluster, we also plot the cluster sizes using ggplot and a simple bar chart.
sizes4k |>
ggplot(aes(x=factor(Cluster), y=Size)) +
geom_col(fill=hcl(195, 100, 65)) +
geom_text(aes(label=Size), vjust=0) +
xlab("Cluster") +
ylab("Size") +
ggtitle("Cluster sizes for k-means 4-cluster solution")
We can now inspect the different clusters and check their mean values. The following code first attaches the estimated cluster to each observation in our data frame. Subsequently, we use dplyr’s group_by() function to calculate the mean of each numeric variable per cluster. When you inspect the resulting data frame, you will notice that some clusters exhibit substantially different mean values for specific variables, whereas for other variables the means barely vary across the clusters.
#Build Cluster Specific Means for all Variables
BankinCRMData$k4Cluster = StrattonCluster_4k[["cluster"]]
BankinCRMData.means.percluster_4k <- BankinCRMData %>%
group_by(k4Cluster) %>%
summarise_if(is.numeric, mean, na.rm = TRUE)
# Transpose (variables as rows, clusters as columns) and round for display
BankinCRMData.means.percluster_4k |>
t() |>
round(0)
[,1] [,2] [,3] [,4]
k4Cluster 1 2 3 4
Age 42 37 44 29
Income 67991 74948 157931 50088
HouseholdSize 3 2 3 3
CityAreaSize 450058 453890 160005 454998
MeanCityIncome 109775 235000 115723 183432
MeanCityHousePrize 131557 450106 574431 1402025
MeanCityHouseHoldSize 4 2 3 3
MeanCitySqFtPrice 2290 6194 5564 5536
NumberCars 1 0 2 1
InternetTrafficVolume 115 115 35 66
MortageVolume 124792 250262 273854 167563
AccountSpending 1296 797 2577 672
CreditCardSpending 751 550 1356 980
HelpHotlineTime 8 10 26 7
CustomerSince 12 3 42 12
GrocerySpending 650 422 900 342
StockVolume 3501 5499 2397 994
CreditVolume 1501 3500 250 1394
NASDAQInvest 3500 4500 2253 687
USAXSFundInvest 2897 1049 200 614
BranchVisits 1 2 8 3
AppLogins 90 115 42 43
ATMVisits 7 3 3 6
TimeOnlineBanking 175 225 90 91
ServiceFees 75 110 30 25
SocialMediaInter 35 55 4 16
Bitcoins 1 0 0 0
NFTs 6 9 2 2
Transpose the data and reformat it to increase readability. Then assess the quality of the 4-cluster solution.
Another approach to assessing the quality of our segmentation is to plot the different clusters. A key challenge here is dimensionality: because our clusters depend on a multitude of variables, we cannot plot them all at once. To obtain a plot, we need to reduce the data to two dimensions, which allows us to display the observations in a two-dimensional space. A common technique for this is principal component analysis (PCA), which summarizes the variables in a small number of uncorrelated components; the observations can then be plotted on the first two components. The plot enables us to assess whether clusters overlap or whether we achieve a meaningful separation between the identified clusters. R’s factoextra package offers functions that do this with a single command, so there is no need to code the PCA or the plot ourselves.
#Plot Clusters for 4k solution
library(factoextra)
fviz_cluster(StrattonCluster_4k, BankinCRMData, ellipse.type = "norm") # object, original data
fviz_cluster(StrattonCluster_4k, scaled.crm, ellipse.type = "norm") # object, scaled data; no difference observed (fviz_cluster() standardizes the data by default, stand = TRUE)
A quick inspection of the plot already reveals that our 4-cluster solution is not optimal, as we can still see separable groups of close-together observations within some clusters. Especially in the case of Clusters 3 and 4 (the larger ones), it appears that we could split each of these groups into two further subgroups.
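The PCA step that fviz_cluster() performs for us can be sketched in a few lines of base R and ggplot2. The following minimal example is an illustration under stated assumptions, not a reproduction of factoextra’s internals: it uses simulated data (toy_x, two hypothetical groups of 50 observations on five variables) in place of scaled.crm, standardizes the variables with scale(), runs kmeans(), and plots the observations on the first two principal components obtained from prcomp().

```r
# Sketch: plot k-means clusters on the first two principal components.
# toy_x is hypothetical simulated data standing in for scaled.crm.
library(ggplot2)

set.seed(42)
toy_x <- rbind(
  matrix(rnorm(50 * 5, mean = 0), ncol = 5),  # group 1: 50 obs., 5 variables
  matrix(rnorm(50 * 5, mean = 3), ncol = 5)   # group 2: shifted by 3
)
colnames(toy_x) <- paste0("V", 1:5)

toy_scaled <- scale(toy_x)                    # standardize each variable
toy_km <- kmeans(toy_scaled, centers = 2, nstart = 25)

pca <- prcomp(toy_scaled)                     # principal component analysis
scores <- data.frame(PC1 = pca$x[, 1],
                     PC2 = pca$x[, 2],
                     Cluster = factor(toy_km$cluster))

# Share of the total variance captured by each component
var_share <- pca$sdev^2 / sum(pca$sdev^2)

ggplot(scores, aes(PC1, PC2, colour = Cluster)) +
  geom_point() +
  stat_ellipse(type = "norm") +               # normal-theory ellipse per cluster
  labs(x = sprintf("PC1 (%.1f%%)", 100 * var_share[1]),
       y = sprintf("PC2 (%.1f%%)", 100 * var_share[2]),
       title = "k-means clusters on the first two principal components")
```

The percentages on the axes show how much of the total variance the two-dimensional picture retains. If this share is low, clusters that seem to overlap in the plot may still be separated along the remaining components.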
Repeat the cluster analysis with k = 5 and k = 6. What can you tell about the results?
Trying out different solutions may point you in the right direction, but you will soon realize that determining the optimal number of clusters can be challenging.
To find the “best” number of clusters, different approaches and measures are available. Before we discuss them, let’s first reflect on what we want to achieve with a cluster analysis.
We want to obtain subgroups that are internally homogeneous. Maximizing within-group homogeneity means minimizing the variance among the members of each cluster. The sum of the within-cluster variances across all clusters therefore describes the overall degree of homogeneity achieved by a specific cluster solution, which allows us to compare solutions with different numbers of clusters. Note, however, that this sum generally decreases as we add clusters (in the extreme, every observation forms its own cluster and the within-cluster variance is zero). We are therefore not looking for the solution with the smallest variance, but for the number of clusters beyond which additional clusters yield only small reductions.
Using the within-cluster variance values, we can determine which solution works best and then focus on that cluster analysis. To do so, we estimate one cluster solution for each number of clusters from 1 to k and then plot the sum of within-cluster variances for each solution.
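The procedure just described can be sketched by hand in base R. In this minimal example, toy_x is hypothetical simulated data with three groups rather than the chapter’s CRM data; kmeans() stores the total within-cluster sum of squares of each solution in its tot.withinss component, and the choice of nstart = 25 random starts is our own assumption to reduce the risk of poor local solutions.

```r
# Sketch: elbow plot built by hand with base R's kmeans().
# toy_x is hypothetical simulated data with three groups.
set.seed(123)
toy_x <- rbind(
  cbind(rnorm(60, 0, 0.5), rnorm(60, 0, 0.5)),
  cbind(rnorm(60, 4, 0.5), rnorm(60, 4, 0.5)),
  cbind(rnorm(60, 0, 0.5), rnorm(60, 4, 0.5))
)

k_values <- 1:10
# One k-means solution per k; tot.withinss is the total within-cluster SS
wss <- sapply(k_values, function(k) {
  kmeans(toy_x, centers = k, nstart = 25)$tot.withinss
})

plot(k_values, wss, type = "b",
     xlab = "Number of clusters k",
     ylab = "Total within-cluster sum of squares",
     main = "Elbow plot (sketch)")
```

For these data, the curve should fall steeply up to k = 3 and flatten afterwards, matching the three simulated groups. fviz_nbclust(..., method = "wss") runs an equivalent loop for us.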
Again, R can do this for us with a few lines of code. Below you find two criteria for choosing the number of clusters: the within-cluster sum of squares (elbow method) and the silhouette coefficient. We ask R to estimate k-means models with k values from 1 to 15 and then plot the criterion for each solution. Don’t worry if this takes some time.
#Obtain Elbow Plots to determine optimal k
factoextra::fviz_nbclust(na.omit(scaled.crm), kmeans, method = "wss", k.max = 15) # wss: within-cluster sums of squares
The elbow plot shows the total within-cluster sum of squares for each of the 15 estimated solutions. The rule of thumb states that the optimal number of clusters lies at the “elbow” of the curve, where additional clusters stop producing large reductions. Here this is rather tricky, as the curve drops immediately and shows very low summed variances for the solutions with 2 to 15 clusters. Therefore, we rely on a second method, the silhouette plot.
The silhouette score for each data point is defined as:
\[ s(i) = \frac{b(i) - a(i)}{\max\{a(i), b(i)\}} \]
where:
\(a(i)\) = average distance from point \(i\) to all other points in the same cluster (intra-cluster distance)
\(b(i)\) = average distance from point \(i\) to all points in the nearest other cluster (inter-cluster distance)
The silhouette score \(s(i)\) ranges from −1 to +1:
close to +1: the point is well clustered
close to 0: the point lies on or near the boundary between two clusters
close to −1: the point might be assigned to the wrong cluster
You compute the average silhouette score over all data points for each value of k, and choose the k that maximizes this average.
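The definition can be checked numerically. The sketch below, on hypothetical simulated data (toy_x, two groups), computes silhouette widths with silhouette() from the cluster package and then recomputes \(s(i)\) for the first observation by hand from \(a(i)\) and \(b(i)\) as defined above.

```r
# Sketch: silhouette widths with cluster::silhouette(), plus a by-hand
# check of s(i) for the first observation. toy_x is hypothetical data.
library(cluster)

set.seed(7)
toy_x <- rbind(
  cbind(rnorm(40, 0, 0.5), rnorm(40, 0, 0.5)),
  cbind(rnorm(40, 3, 0.5), rnorm(40, 3, 0.5))
)
km <- kmeans(toy_x, centers = 2, nstart = 25)
d <- dist(toy_x)                              # Euclidean distances

sil <- silhouette(km$cluster, d)
avg_sil <- mean(sil[, "sil_width"])           # average silhouette width

# By hand for observation i = 1
D <- as.matrix(d)
i <- 1
own <- km$cluster[i]
same <- setdiff(which(km$cluster == own), i)
a_i <- mean(D[i, same])                       # a(i): own cluster
b_i <- min(sapply(setdiff(unique(km$cluster), own),
                  function(cl) mean(D[i, km$cluster == cl])))  # b(i)
s_i <- (b_i - a_i) / max(a_i, b_i)

c(by_hand = s_i, cluster_pkg = unname(sil[i, "sil_width"]), average = avg_sil)
```

To compare values of k, you would repeat this for each candidate k and keep the one with the largest avg_sil; this is what fviz_nbclust(..., method = "silhouette") automates.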
#Obtain Silhouette Plots to determine optimal k
factoextra::fviz_nbclust(na.omit(scaled.crm), kmeans, method = "silhouette", k.max = 15)
The silhouette coefficient measures how similar an object is to the other members of its own cluster compared with the members of the nearest neighboring cluster. The coefficient ranges from −1 to +1: high values indicate strong separation, low values poor separation. We thus want to select the cluster solution with the highest average silhouette coefficient. In our case, the plot suggests 8 clusters. Looking back at the elbow plot, 8 seems rather high, especially as the elbow lies somewhere between 5 and 7. The silhouette plot also suggests that the 7-cluster solution is inferior to the 6- and 8-cluster solutions. We may thus enrich our insights by plotting all three solutions, which we do after introducing a third criterion, the gap statistic.
The Gap Statistic compares the total within-cluster variation for different values of k with their expected values under a null reference distribution (i.e., data with no clustering structure).
It was introduced by Tibshirani, Walther, and Hastie (2001) and works as follows:
\[ \text{Gap}(k) = \mathbb{E}[\log(W_k^*)] - \log(W_k) \]
Where:
\(W_k\): total within-cluster dispersion for your real data
\(W_k^*\): total within-cluster dispersion for reference data (generated from a uniform distribution)
\(\mathbb{E}[\log(W_k^*)]\): expected log within-cluster dispersion under the null reference distribution, estimated as the average over \(B\) reference datasets
The optimal number of clusters is the smallest k such that:
\[ \text{Gap}(k) \geq \text{Gap}(k+1) - s_{k+1} \]
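Where \(s_{k+1}\) is the standard error of the gap statistic at \(k+1\). It is computed as \(s_k = \text{sd}_k \sqrt{1 + 1/B}\), where \(\text{sd}_k\) is the standard deviation of \(\log(W_k^*)\) across the \(B\) reference datasets.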
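To make the formula concrete, here is a minimal sketch of the gap statistic in base R. It rests on several assumptions of our own: toy_x is hypothetical simulated data with two groups; \(W_k\) is taken to be the k-means total within-cluster sum of squares; and the reference datasets are drawn uniformly over the range of each observed variable. The clusGap() function used below differs in some defaults (its output, for instance, reports spaceH0 = "scaledPCA", a reference box aligned with the principal components), so its numbers will not match this sketch exactly.

```r
# Minimal gap-statistic sketch (Tibshirani, Walther, and Hastie 2001).
# Assumptions: toy_x is hypothetical simulated data; W_k is the k-means
# total within-cluster sum of squares; reference data are uniform over
# the range of each observed variable.
set.seed(2024)
toy_x <- rbind(
  cbind(rnorm(50, 0, 0.3), rnorm(50, 0, 0.3)),
  cbind(rnorm(50, 3, 0.3), rnorm(50, 3, 0.3))
)

# W_k: total within-cluster sum of squares for k clusters
w_k <- function(data, k) {
  if (k == 1) return(sum(scale(data, scale = FALSE)^2))  # one cluster
  kmeans(data, centers = k, nstart = 10)$tot.withinss
}

# One uniform reference dataset with the same dimensions as `data`
draw_reference <- function(data) {
  apply(data, 2, function(col) runif(length(col), min(col), max(col)))
}

K_max <- 5    # largest number of clusters considered
B     <- 20   # number of reference datasets
gap_k <- se_k <- numeric(K_max)

for (k in seq_len(K_max)) {
  log_w     <- log(w_k(toy_x, k))
  log_w_ref <- replicate(B, log(w_k(draw_reference(toy_x), k)))
  gap_k[k]  <- mean(log_w_ref) - log_w           # Gap(k)
  se_k[k]   <- sd(log_w_ref) * sqrt(1 + 1 / B)   # s_k
}

# Smallest k with Gap(k) >= Gap(k + 1) - s_(k + 1)
k_hat <- which(gap_k[-K_max] >= gap_k[-1] - se_k[-1])[1]
round(data.frame(k = seq_len(K_max), gap = gap_k, se = se_k), 3)
k_hat
```

With two well-separated simulated groups, the selection rule should pick k_hat = 2.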
```{r}
#| eval: false
library(cluster)
set.seed(123)
gap_stat <- clusGap(na.omit(scaled.crm),
FUNcluster = kmeans,
K.max = 9, B = 60)
print(gap_stat)
fviz_gap_stat(gap_stat)
```
> print(gap_stat)
Clustering Gap statistic ["clusGap"] from call:
clusGap(x = na.omit(scaled.crm), FUNcluster = kmeans, K.max = 9, B = 60)
B=60 simulated reference sets, k = 1..9; spaceH0="scaledPCA"
--> Number of clusters (method 'firstSEmax', SE.factor=1): 8
logW E.logW gap SE.sim
[1,] 9.849855 10.24394 0.3940806 0.001610821
[2,] 9.649660 10.14771 0.4980528 0.001269681
[3,] 9.589105 10.12340 0.5342959 0.002450073
[4,] 9.355471 10.10131 0.7458421 0.002305038
[5,] 9.072338 10.08633 1.0139880 0.001651291
[6,] 9.003449 10.07295 1.0695044 0.001460224
[7,] 8.949475 10.06186 1.1123887 0.001367756
[8,] 8.903180 10.05149 1.1483141 0.001347131
[9,] 8.979300 10.04303 1.0637305 0.001284367
fviz_gap_stat(gap_stat)
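The choice of 8 clusters can be traced back to the printed table. The sketch below copies the gap and SE.sim columns from the output above and passes them to maxSE() from the cluster package, once with "firstSEmax" (the method named in the printed output) and once with "Tibs2001SEmax", which implements the rule \(\text{Gap}(k) \geq \text{Gap}(k+1) - s_{k+1}\) stated earlier.

```r
# Gap and SE.sim values copied from the print(gap_stat) output above
library(cluster)

gap_values <- c(0.3940806, 0.4980528, 0.5342959, 0.7458421, 1.0139880,
                1.0695044, 1.1123887, 1.1483141, 1.0637305)
se_values  <- c(0.001610821, 0.001269681, 0.002450073, 0.002305038,
                0.001651291, 0.001460224, 0.001367756, 0.001347131,
                0.001284367)

# Rule reported by print(gap_stat): "firstSEmax" with SE.factor = 1
k_first <- maxSE(gap_values, se_values, method = "firstSEmax", SE.factor = 1)
# Rule of Tibshirani et al.: smallest k with Gap(k) >= Gap(k+1) - s_(k+1)
k_tibs  <- maxSE(gap_values, se_values, method = "Tibs2001SEmax")

c(firstSEmax = k_first, Tibs2001SEmax = k_tibs)
```

Both rules return 8 here, because the gap rises steadily up to \(k = 8\) and then drops at \(k = 9\) by far more than one standard error.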
Repeat the gap statistic analysis with 30 bootstrapped reference samples (B = 30). Report and interpret the results.
| Method | Measures | Pros | Cons |
|---|---|---|---|
| Elbow | WCSS (within-cluster sum of squares) | Simple and fast | Subjective; elbow is often unclear |
| Silhouette | Separation and cohesion | More interpretable; good for cluster quality insight | Computationally heavier |
| Gap statistic | WCSS vs. null reference distribution | Statistically grounded; includes standard error | Computationally expensive (especially with high B) |
The gap statistic is especially useful when you want a formal, test-like approach to selecting k, as it incorporates a reference distribution and a standard error.
It is generally more objective than the elbow method, because it does not rely on visually locating an elbow.
It can outperform the silhouette method when the cluster structure is subtle or noisy.
But it is also slower, especially for large datasets.
If you are doing a rigorous analysis and can afford the computing time, the gap statistic is arguably the best overall choice; if you want speed and interpretability, the silhouette method is an excellent alternative. If possible, use all three methods to triangulate the best k.
#Plot Cluster Solutions
#k6
set.seed(321)
StrattonCluster_6k <- kmeans(scaled.crm, 6)
fviz_cluster(StrattonCluster_6k, scaled.crm, ellipse.type = "norm")
#k7
set.seed(321)
StrattonCluster_7k <- kmeans(scaled.crm, 7)
fviz_cluster(StrattonCluster_7k, scaled.crm, ellipse.type = "norm")
#k8
set.seed(321)
StrattonCluster_8k <- kmeans(scaled.crm, 8)
fviz_cluster(StrattonCluster_8k, scaled.crm, ellipse.type = "norm")
Let us start by looking in more detail at our 8-cluster k-means model and checking how large each cluster is, using the following code.
# 8 cluster k-means cluster size plot
sizes8k <- data.frame(Size = StrattonCluster_8k[["size"]],
Cluster = c("Cluster1", "Cluster2", "Cluster3", "Cluster4",
"Cluster5", "Cluster6", "Cluster7", "Cluster8"))
sizes8k |>
ggplot(aes(factor(Cluster), Size)) +
geom_col(fill=hcl(195, 100, 65)) +
xlab("Cluster") +
ylab("Size") +
geom_text(aes(label=Size), vjust=0) +
ggtitle("Cluster sizes k-means 8-cluster solution")
To gain deeper insights into the spending behavior and digital affinity of the different segments, we want to plot the per-cluster means of the various variables. To achieve this, we first assemble a descriptive data set with all variable means per cluster with the help of dplyr’s group_by() function.
# Build Mean per Cluster DataFrame
BankinCRMData$k8Cluster = StrattonCluster_8k$cluster
BankinCRMData.means.percluster_8k = BankinCRMData |>
group_by(k8Cluster) |>
summarise_if(is.numeric, mean, na.rm = TRUE)
BankinCRMData.means.percluster_8k
glimpse(BankinCRMData.means.percluster_8k)
Rows: 8
Columns: 30
$ k8Cluster <int> 1, 2, 3, 4, 5, 6, 7, 8
$ Age <dbl> 57.96533, 25.68300, 23.09436, 37.35760, 41.87000…
$ Income <dbl> 180029.46, 90409.48, 37983.58, 74948.43, 67991.1…
$ HouseholdSize <dbl> 3.488000, 2.194000, 3.635214, 2.113600, 3.441000…
$ CityAreaSize <dbl> 120068.7, 132506.0, 690126.4, 453890.4, 450057.9…
$ MeanCityIncome <dbl> 139990.5, 115687.9, 250170.2, 235000.0, 109774.5…
$ MeanCityHousePrize <dbl> 620012.3, 369396.5, 1849960.6, 450105.5, 131557.…
$ MeanCityHouseHoldSize <dbl> 3.366667, 2.533333, 2.276265, 2.018400, 4.144000…
$ MeanCitySqFtPrice <dbl> 6497.381, 3458.722, 8594.622, 6193.529, 2290.453…
$ NumberCars <dbl> 2.7700000, 1.3866667, 0.8939689, 0.3616000, 1.35…
$ InternetTrafficVolume <dbl> 15.01533, 50.01333, 84.94163, 115.01680, 114.993…
$ MortageVolume <dbl> 149966.17, 318868.28, 14989.43, 250262.26, 12479…
$ AccountSpending <dbl> 3501.7735, 1097.0318, 551.9743, 796.5764, 1296.4…
$ CreditCardSpending <dbl> 1900.2100, 1155.2976, 649.3741, 549.8931, 750.66…
$ HelpHotlineTime <dbl> 34.911152, 15.795097, 4.358455, 10.275851, 8.287…
$ CustomerSince <dbl> 65.082000, 10.751333, 2.958171, 2.516000, 12.486…
$ GrocerySpending <dbl> 1200.1166, 474.9574, 264.9690, 421.9176, 650.101…
$ StockVolume <dbl> 2393.6115, 1850.4302, 649.3551, 5499.0369, 3501.…
$ CreditVolume <dbl> 149.3357, 249.7797, 2499.1785, 3500.3457, 1500.7…
$ NASDAQInvest <dbl> 3003.1411, 925.6698, 400.1223, 4499.8338, 3499.9…
$ USAXSFundInvest <dbl> 100.0340, 900.1365, 150.3297, 1049.4645, 2897.03…
$ BranchVisits <dbl> 9.985333, 4.506333, 2.291829, 1.610400, 1.146000…
$ AppLogins <dbl> 10.02533, 54.98833, 65.09241, 114.90560, 90.1300…
$ ATMVisits <dbl> 4.036000, 2.775333, 6.167315, 3.189600, 7.103000…
$ TimeOnlineBanking <dbl> 30.03605, 137.57997, 84.98368, 224.96318, 175.13…
$ ServiceFees <dbl> 44.87447, 15.09119, 24.97558, 109.95768, 75.2831…
$ SocialMediaInter <dbl> 2.502667, 5.498333, 16.380350, 55.420800, 35.333…
$ Bitcoins <dbl> 0.0001021333, 0.2002524667, 0.1000003891, 0.4999…
$ NFTs <dbl> 1.000000, 2.007000, 3.023346, 9.004800, 6.472000…
$ k4Cluster <dbl> 3.000000, 3.501333, 4.000000, 2.000000, 1.000000…
We can now generate bar plots of the different variables of interest and see whether we find promising segments of Stratton & Fils customers who might be open to and suitable for Stratton AE Banking. Let us first focus on spending behavior, as indicated by the service fee variable. Note that we adapted some of the ggplot commands. By leaving geom_col() empty, we do not specify a fill color, and the bars remain grey. In addition, we ask geom_text() to add labels with the values of ServiceFees rounded to two decimal places, in white and in font size 4. With position_stack(vjust = 0.5), we place the labels in the middle of each bar.
#Barplot of Service Fees
BankinCRMData.means.percluster_8k |>
ggplot(aes(factor(k8Cluster), ServiceFees)) +
geom_col() +
geom_text(aes(label = round(ServiceFees, digits = 2)),
size = 4, colour = "white",
position = position_stack(vjust = 0.5)) +
labs(x = "Clusters",
y = "Extra Fees paid for banking services",
title = "Average Spending in Service Fees per Cluster")
A visual inspection indicates that clusters 4 and 5 show the highest spending behavior, followed by clusters 1 and 8, while the remaining clusters pay rather low service fees. This makes at least these four high-spending segments attractive for AE Banking.
However, to be sure that the rather novel and highly digital app service appeals to these segments, we need to understand how digitally active and interested these segments are.
Let us first focus on the latest developments in fintech, such as Bitcoin and NFT investments. We can again compare the segment-specific means for both variables. This time, we want to combine the plots of Bitcoins and NFTs in one figure. We can arrange this with ggplot’s facet_wrap() function, which allows us to combine plots of different variables. The only “complication” is that we first need to rearrange the dataset we want to plot, which we can do with tidyr, also part of the tidyverse.
We first select the variables of interest (cluster, NFTs, and Bitcoins) and then reshape the data frame from a wide to a long format with pivot_longer(). We can then use ggplot again. This time we use geom_bar(stat = 'identity') instead of geom_col(); both draw bars whose heights are given by the data. facet_wrap() then tells ggplot to make one plot per variable and stack them vertically (ncol = 1). By setting scales to “free_y”, we allow different y-axis ranges, given that the scales of the two variables differ substantially.
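The reshaping step is easiest to see on a tiny example. The sketch below uses a hypothetical three-cluster table (the values are made up, not the Stratton means) and shows how pivot_longer() turns one column per variable into one row per cluster-variable pair, which is the shape facet_wrap() needs.

```r
# Sketch: reshape a small wide table of cluster means into long format.
# The values below are hypothetical, not the Stratton cluster means.
library(tidyr)
library(ggplot2)

means_wide <- data.frame(
  k8Cluster = 1:3,
  NFTs      = c(1.0, 2.0, 3.0),
  Bitcoins  = c(0.0, 0.2, 0.1)
)

means_long <- pivot_longer(means_wide,
                           cols = -k8Cluster,       # keep the cluster id
                           names_to = "variable",
                           values_to = "value")
means_long   # 3 clusters x 2 variables = 6 rows

ggplot(means_long, aes(factor(k8Cluster), value)) +
  geom_col() +                      # same as geom_bar(stat = "identity")
  facet_wrap(~variable, ncol = 1, scales = "free_y")
```

Each row of means_wide becomes two rows of means_long, one for NFTs and one for Bitcoins, so facet_wrap(~variable) can draw one panel per variable.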
#Barplots of Fintech Investments
FinTech <- BankinCRMData.means.percluster_8k |>
select(k8Cluster, NFTs, Bitcoins) |>
pivot_longer(cols = -k8Cluster,
names_to = "variable",
values_to = "value")
ggplot(FinTech, aes(factor(k8Cluster), value))+
geom_bar(stat='identity') +
xlab("Clusters") +
facet_wrap(~variable, ncol=1, scales = "free_y") +
geom_text(aes(label = round(value, digits = 1)),
size = 4,
colour = "white",
position = position_stack(vjust = 0.5)) +
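ggtitle("FinTech Cluster Means")
Let us now look at digital activities and compare them with offline activities.
With the following code, we can inspect the cluster means of the digital and offline activity variables (branch visits, app logins, ATM visits, time spent on online banking, social media interactions, and internet traffic volume).
Note that facet_wrap() now arranges the panels in two columns (ncol = 2).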
#Plots for Digital vs. Offline Life
DigLife = BankinCRMData.means.percluster_8k %>%
select(k8Cluster, BranchVisits, AppLogins,
ATMVisits, TimeOnlineBanking, SocialMediaInter,
InternetTrafficVolume) %>%
pivot_longer(cols = -k8Cluster,
names_to = "variable",
values_to = "value")
ggplot(DigLife, aes(factor(k8Cluster), value))+
geom_bar(stat='identity') + xlab("Clusters") +
facet_wrap(~variable, ncol=2, scales = "free_y") +
geom_text(aes(label = round(value, digits = 1)), size = 4, colour = "white",
position = position_stack(vjust = 0.5)) +
ggtitle("Digital Life vs. Offline Life Cluster Means")
Next, let us look at the investment and spending behavior of the clusters.
```{r}
# Plots for investments
Invest <- BankinCRMData.means.percluster_8k |>
  select(k8Cluster, MortageVolume, StockVolume,
         NASDAQInvest, USAXSFundInvest) |>
  pivot_longer(-k8Cluster, names_to = "variable", values_to = "value")

ggplot(Invest, aes(factor(k8Cluster), value)) +
  geom_bar(stat = "identity") +
  xlab("Clusters") +
  facet_wrap(~variable, ncol = 2, scales = "free_y") +
  geom_text(aes(label = round(value, digits = 1)),
            size = 2, colour = "white",
            position = position_stack(vjust = 0.5)) +
  ggtitle("Investment Cluster Means")
```

```{r}
# Plots for spending: reshape to long format (one row per cluster-variable pair)
Spending <- BankinCRMData.means.percluster_8k |>
  select(k8Cluster, AccountSpending,
         CreditCardSpending, GrocerySpending) |>
  pivot_longer(-k8Cluster, names_to = "variable", values_to = "value")

ggplot(Spending, aes(factor(k8Cluster), value)) +
  geom_bar(stat = "identity") +
  xlab("Clusters") +
  facet_wrap(~variable, ncol = 1, scales = "free_y") +
  geom_text(aes(label = round(value, digits = 1)),
            size = 3, colour = "white",
            position = position_stack(vjust = 0.5)) +
  ggtitle("Spending Cluster Means")
```

Investment:
From the two plots, it becomes evident that clusters 4 and 5 hold higher stock volumes than the other clusters and, at the same time, carry lower mortgage volumes.
Looking at the types of investments, cluster 4 is more invested in NASDAQ-listed companies than any other cluster, while cluster 6 is strongly invested in Stratton's fund for small and mid-size US companies.
Spending:
The spending variables indicate that both segments consist of customers with comparatively low spending, with cluster 4 showing the lowest credit card turnover of all clusters.
For grocery expenditures, cluster 5 has the second-highest average spending of all clusters.
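Statements such as "the lowest credit card turnover" or "the second-highest average spending" are rankings of the cluster means within each variable. The following is a minimal sketch of how such ranks could be computed with dplyr, using a small hypothetical long-format table shaped like `Spending` (three clusters, made-up values).

```{r}
library(dplyr)
library(tibble)

# Hypothetical long-format cluster means (illustrative values only)
spending_long <- tibble(
  k8Cluster = rep(1:3, times = 2),
  variable  = rep(c("CreditCardSpending", "GrocerySpending"), each = 3),
  value     = c(900, 400, 650, 300, 520, 410)
)

# Rank clusters within each variable; rank 1 = highest mean
spending_ranked <- spending_long |>
  group_by(variable) |>
  mutate(rank = min_rank(desc(value))) |>
  ungroup() |>
  arrange(variable, rank)
spending_ranked
```

In this toy table, cluster 2 has the lowest credit card spending (rank 3) but the highest grocery spending (rank 1). With the real data, the same pipeline could be applied to the `Spending` or `Invest` objects.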
Finally, we can enrich our insights by looking at the living conditions of the segments and where they are located. To do so, we compare the residential information across clusters.
```{r}
# Plots of residential information
Life <- BankinCRMData.means.percluster_8k |>
  select(k8Cluster, CityAreaSize, MeanCitySqFtPrice,
         MeanCityHouseHoldSize, MeanCityIncome) |>
  pivot_longer(-k8Cluster, names_to = "variable", values_to = "value")

ggplot(Life, aes(factor(k8Cluster), value)) +
  geom_bar(stat = "identity") +
  xlab("Clusters") +
  facet_wrap(~variable, ncol = 2, scales = "free_y") +
  geom_text(aes(label = round(value, digits = 1)),
            size = 2, colour = "white",
            position = position_stack(vjust = 0.5)) +
  ggtitle("Life Conditions Cluster Means")
```

From the plot we learn that clusters 4 and 5 both prefer city areas with mid-to-high population levels.
In the case of cluster 4, the average household size in the residential areas is rather small, while cluster 5 lives in areas with larger households of four members on average.
Looking at the areas' average incomes and square-foot prices, we learn that cluster 4 lives in wealthier neighborhoods with higher property prices, whereas cluster 5 members prefer middle-class neighborhoods with more affordable property prices.
Combining the information at hand, how would you characterize the members of clusters 4 and 5, and how do you think they differ from each other?
Can you also develop personas for the other clusters?
The results of the cluster analysis allow Stratton AE Banking to take several important marketing actions. First, a thorough understanding of the available market segments shows the joint venture which types of customers exist in the current customer base and which segments it should build on in its future marketing activities.
To develop suitable positioning strategies for each cluster and, subsequently, communication campaigns, the bank can draw on the additional insights gained from comparing the cluster-specific means of the remaining variables.
Furthermore, the results of the cluster analysis can also be used to predict the interests and preferences of new customers. To do so, one can calculate the Euclidean distance between a new customer and the center (i.e., the vector of variable means) of each cluster, using the same variables and scaling as in the clustering. The customer most likely belongs to the cluster with the smallest distance.
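This assignment rule can be sketched in base R. The centers and the new customer below are hypothetical, standardized values; we assume the clustering was run on standardized variables, so a new customer must be scaled with the same means and standard deviations before the comparison. If the clusters came from `kmeans()`, the fitted object's `centers` matrix could take the place of the hypothetical `centers` used here.

```{r}
# Hypothetical cluster centers: one row per cluster, one column per
# clustering variable (illustrative, already-standardized values)
centers <- rbind(
  c(Age = -0.5, Income =  0.2, AppLogins =  1.1),
  c(Age =  0.8, Income =  0.9, AppLogins = -0.7),
  c(Age =  0.1, Income = -0.6, AppLogins =  0.0)
)
rownames(centers) <- paste0("Cluster", 1:3)

# Hypothetical new customer, standardized like the clustering data
new_customer <- c(Age = -0.3, Income = 0.1, AppLogins = 0.9)

# Euclidean distance from the new customer to every cluster center:
# subtract the customer from each row, square, sum per row, take the root
distances <- sqrt(rowSums(sweep(centers, 2, new_customer)^2))

# Assign the customer to the cluster with the smallest distance
assigned <- names(which.min(distances))
distances
assigned
```

Here the new customer lies about 0.3 units from `Cluster1` and clearly farther from the other two centers, so `assigned` is `"Cluster1"`.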
8.3 Socio-Economics by 8 Clusters
The plots reveal the limits of socio-economic clustering: average age and household size vary little across the 8 clusters.
Income shows some variation. Clusters 4 and 5 remain close to the overall mean of the dataset, indicating that the digitally savvy users identified are neither poor nor rich, which still makes them a suitable target group.
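One way to make "close to the overall mean" concrete is to express each cluster's mean as an index relative to the overall mean (100 = overall mean). The sketch below uses hypothetical customer-level data with made-up incomes. The overall mean is computed over all customers rather than as the average of the cluster means, because the two differ when clusters have different sizes.

```{r}
library(dplyr)
library(tibble)

# Hypothetical customer-level data with cluster labels (illustrative only)
customers <- tibble(
  k8Cluster = c(1, 1, 2, 2, 3, 3),
  Income    = c(40000, 60000, 90000, 110000, 70000, 70000)
)

# Overall mean across all customers (about 73,333 here)
overall_mean <- mean(customers$Income)

# Cluster means expressed as an index: 100 = overall mean
income_index <- customers |>
  group_by(k8Cluster) |>
  summarise(mean_income = mean(Income)) |>
  mutate(index = 100 * mean_income / overall_mean)
income_index
```

In this toy example, cluster 3 has an index of about 95, i.e., close to the overall mean, while clusters 1 and 2 lie well below and above it.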
In terms of age, both segments consist of well-established adults in their late 30s or early 40s.
Given that the socio-economic information indicates stable incomes for the digitally savvy users, the next step should focus on spending and investment behavior to understand whether these segments offer sufficient business volume and growth potential.